
    Global disease monitoring and forecasting with Wikipedia

    Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrated that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with r² up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.
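    To make the approach concrete, below is a minimal sketch of the kind of linear-model fit the abstract describes: regressing official case counts on weekly access counts for a few Wikipedia articles and reporting r². The article names, coefficients, and data are hypothetical placeholders, not the paper's actual inputs.

```python
# Minimal sketch of the linear-model approach described above: regress official
# case counts on weekly access counts for a handful of Wikipedia articles.
# All article names and data here are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Weekly access counts for candidate articles (rows = weeks, cols = articles),
# e.g. ["Influenza", "Fever", "Oseltamivir"] in the relevant language edition.
X = rng.poisson(lam=[2000, 1500, 300], size=(52, 3)).astype(float)

# Official weekly case counts (hypothetical ground truth).
y = 0.01 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0, 5, size=52)

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# r^2 of the fit, the figure of merit reported in the abstract.
y_hat = A @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"r^2 = {r2:.3f}")
```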

    Google Health Trends performance reflecting dengue incidence for the Brazilian states

    Background: Dengue fever is a mosquito-borne infection transmitted by Aedes aegypti and found mainly in tropical and subtropical regions worldwide. Since its re-introduction in 1986, Brazil has become a hotspot for dengue and has experienced yearly epidemics. As dengue is a notifiable infectious disease, Brazil uses a passive epidemiological surveillance system to collect and report cases; however, the dengue burden is underestimated. Internet data streams may therefore complement surveillance activities by providing real-time information in the face of reporting lags.
    Methods: We analyzed 19 terms related to dengue using Google Health Trends (GHT), a free Internet data source, and compared them with weekly dengue incidence from 2011 to 2016. We correlated GHT data with dengue incidence at the national and state level for Brazil, using the adjusted R squared statistic (range 0 to 1) as the primary outcome measure. We used survey data on Internet access and variables from the official census of 2010 to identify where GHT could be useful in tracking dengue dynamics. Finally, we used a standardized volatility index on dengue incidence and developed models with different variables toward the same objective.
    Results: Of the 19 terms explored with GHT, only seven consistently tracked dengue. Of the 27 states, only 12 had an adjusted R squared higher than 0.8; these states were distributed mainly in the Northeast, Southeast, and South of Brazil. The usefulness of GHT was explained by the logarithm of the number of Internet users in the last 3 months, the total population per state, and the standardized volatility index.
    Conclusions: The potential contribution of GHT in complementing established surveillance strategies should be analyzed at geographical resolutions smaller than countries. For Brazil, GHT implementation should be analyzed on a case-by-case basis. State variables including total population, Internet usage in the last 3 months, and the standardized volatility index could serve as indicators of when GHT could complement state-level dengue surveillance in other countries.
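    The primary outcome measure above, adjusted R squared, can be computed as in the following sketch; the search-volume and incidence series below are synthetic placeholders, not real GHT or surveillance data.

```python
# Minimal sketch of the primary outcome measure used above: the adjusted R
# squared of a regression of dengue incidence on GHT search-volume series.
# The data below are synthetic placeholders, not real GHT or case data.
import numpy as np

rng = np.random.default_rng(1)
n_weeks, n_terms = 312, 7            # roughly 2011-2016 weekly data, 7 useful terms

ght = rng.random((n_weeks, n_terms))               # normalized search volumes
incidence = ght @ rng.random(n_terms) + rng.normal(0, 0.1, n_weeks)

A = np.column_stack([np.ones(n_weeks), ght])
coef, *_ = np.linalg.lstsq(A, incidence, rcond=None)
resid = incidence - A @ coef

r2 = 1 - resid.var() / incidence.var()
# Adjusted R^2 penalizes the number of predictors (n_terms) relative to n_weeks.
adj_r2 = 1 - (1 - r2) * (n_weeks - 1) / (n_weeks - n_terms - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```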

    Epidemiological data challenges: planning for a more robust future through data standards

    Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which in turn would yield better public health decision-making capabilities.
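    As an illustration of the kind of machine-readable reporting such standards would enable, a single notifiable-disease record might look like the sketch below; the field names are illustrative guesses, not a schema proposed by the paper.

```python
# Hypothetical example of a machine-readable notifiable-disease record of the
# kind the paper advocates; field names are illustrative, not a real standard.
import csv, io, json

record = {
    "location": "BR-SP",              # ISO 3166-2 region code
    "disease": "dengue",
    "period_start": "2016-01-03",     # ISO 8601 dates for the reporting week
    "period_end": "2016-01-09",
    "case_count": 1234,
    "count_type": "suspected",        # suspected vs. confirmed matters for models
    "report_date": "2016-01-15",      # enables tracking backfill/revisions
}

print(json.dumps(record, indent=2))

# The same record flattened to CSV, a lowest-common-denominator exchange format.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=record.keys())
writer.writeheader()
writer.writerow(record)
print(buf.getvalue())
```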

    Forecasting the 2013–2014 Influenza Season using Wikipedia

    Infectious diseases are one of the leading causes of morbidity and mortality around the world; thus, forecasting their impact is crucial for planning an effective response strategy. According to the Centers for Disease Control and Prevention (CDC), seasonal influenza affects between 5% and 20% of the U.S. population and causes major economic impacts resulting from hospitalization and absenteeism. Understanding influenza dynamics and forecasting its impact is fundamental for developing prevention and mitigation strategies. We combine modern data assimilation methods with Wikipedia access logs and CDC influenza-like illness (ILI) reports to create a weekly forecast for seasonal influenza. The methods are applied to the 2013–2014 influenza season but are sufficiently general to forecast any disease outbreak, given incidence or case count data. We adjust the initialization and parametrization of a disease model and show that this allows us to determine systematic model bias. In addition, we provide a way to determine where the model diverges from observation and evaluate forecast accuracy. Wikipedia article access logs are shown to be highly correlated with historical ILI records and allow for accurate prediction of ILI data several weeks before it becomes available. The results show that, prior to the peak of the flu season, our forecasting method projected the actual outcome with high probability. However, since our model does not account for re-infection or multiple strains of influenza, the tail of the epidemic is not predicted well after the peak of flu season has passed.
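    The forecasting idea can be illustrated with a minimal sketch: fit a compartmental model's parameters to observed incidence, then run the model forward. The paper's actual data-assimilation scheme is more sophisticated than this least-squares fit, and the SIR model, parameters, and data below are illustrative assumptions only.

```python
# Minimal sketch of the forecasting idea above: fit an SIR model's parameters
# to observed weekly incidence, then run the model forward. The paper's actual
# data-assimilation scheme is more sophisticated; this is illustrative only.
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize

def sir(y, t, beta, gamma):
    s, i, r = y
    return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

t_obs = np.arange(20)                       # weeks of observed data
true = odeint(sir, [0.99, 0.01, 0.0], t_obs, args=(1.6, 0.9))[:, 1]
obs = true + np.random.default_rng(2).normal(0, 0.002, len(true))

def loss(params):
    beta, gamma = params
    i_hat = odeint(sir, [0.99, 0.01, 0.0], t_obs, args=(beta, gamma))[:, 1]
    return np.sum((i_hat - obs) ** 2)

fit = minimize(loss, x0=[1.0, 0.5], method="Nelder-Mead")
beta, gamma = fit.x

# Forecast several weeks past the observation window with the fitted model.
t_fc = np.arange(30)
forecast = odeint(sir, [0.99, 0.01, 0.0], t_fc, args=(beta, gamma))[:, 1]
print(f"beta={beta:.2f}, gamma={gamma:.2f}, peak week={forecast.argmax()}")
```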

    The Biosurveillance Analytics Resource Directory (BARD): Facilitating the Use of Epidemiological Models for Infectious Disease Surveillance

    Epidemiological modeling of infectious disease is important for disease management, and its routine use needs to be facilitated through better description of models in an operational context. A standardized model characterization process that allows users to select and manually compare available models and their results is currently lacking. A key need is a universal framework that facilitates model description and understanding of each model's features. Los Alamos National Laboratory (LANL) has developed a comprehensive framework that can be used to characterize an infectious disease model in an operational context. The framework was developed through consensus among a panel of subject matter experts. In this paper, we describe the framework, its application to model characterization, and the development of the Biosurveillance Analytics Resource Directory (BARD; http://brd.bsvgateway.org/brd/), which facilitates the rapid selection of operational models for specific infectious/communicable diseases. We offer this framework and the associated database to stakeholders in the infectious disease modeling field as a tool for standardizing model description and facilitating the use of epidemiological models.
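    A standardized characterization record of the kind such a framework enables might look like the sketch below; these fields are illustrative guesses, not the actual BARD schema.

```python
# Hypothetical model-characterization record illustrating the kind of
# standardized description the framework enables. These fields are
# illustrative guesses, not the actual BARD schema.
model_entry = {
    "name": "Example intra-herd FMD model",
    "disease": "foot-and-mouth disease",
    "model_type": "stochastic compartmental (SEIR)",
    "spatial_scale": "single herd",
    "population": "cattle",
    "inputs": ["contact rate", "latent period", "infectious period"],
    "outputs": ["daily incidence", "outbreak duration"],
    "operational_use": "outbreak response planning",
}

# Uniform records make models searchable and comparable, e.g. by disease:
def matches(entry, disease):
    return entry["disease"] == disease

print(matches(model_entry, "foot-and-mouth disease"))
```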

    Defending Our Public Biological Databases as a Global Critical Infrastructure

    Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly “retrofitted” mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity
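    One mitigation in the spirit of those discussed, borrowed from standard cybersecurity practice, is integrity checking of deposited records with cryptographic digests; the sketch below is a generic illustration, with hypothetical record content, rather than a mechanism the paper prescribes.

```python
# Minimal sketch of one mitigation in the spirit of the above: detecting
# unintentional (or intentional) modification of database records via
# cryptographic digests. Record content and storage are hypothetical.
import hashlib

def digest(record: str) -> str:
    """Return a SHA-256 digest of a sequence record's canonical form."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

original = ">seq1 example entry\nACGTACGTACGT"
stored_digest = digest(original)          # computed when the record is deposited

# Later, on retrieval, recompute and compare to detect tampering or corruption.
retrieved = ">seq1 example entry\nACGTACGAACGT"   # one base silently changed
print("intact" if digest(retrieved) == stored_digest else "record altered")
```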

    Results from the Centers for Disease Control and Prevention's Predict the 2013–2014 Influenza Season Challenge

    Background: Early insights into the timing of the start, peak, and intensity of the influenza season could be useful in planning influenza prevention and control activities. To encourage development and innovation in influenza forecasting, the Centers for Disease Control and Prevention (CDC) organized a challenge to predict the 2013–14 United States influenza season.
    Methods: Challenge contestants were asked to forecast the start, peak, and intensity of the 2013–2014 influenza season at the national level and at any or all Health and Human Services (HHS) region levels. The challenge ran from December 1, 2013 to March 27, 2014; contestants were required to submit 9 biweekly forecasts at the national level to be eligible. The winner was selected based on expert evaluation of the methodology used to make the prediction and on the accuracy of the prediction as judged against the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet).
    Results: Nine teams submitted 13 forecasts covering all required milestones. The first forecast was due on December 2, 2013; 3 of the 13 forecasts received correctly predicted the start of the influenza season within one week, 1 of 13 predicted the peak within 1 week, 3 of 13 predicted the peak ILINet percentage within 1%, and 4 of 13 predicted the season duration within 1 week. For the prediction due on December 19, 2013, the number of forecasts that correctly predicted the peak week increased to 2 of 13, the peak percentage to 6 of 13, and the duration of the season to 6 of 13. As the season progressed, the forecasts became more stable and closer to the season milestones.
    Conclusion: Forecasting has become technically feasible, but further efforts are needed to improve forecast accuracy so that policy makers can reliably use these predictions. CDC and the challenge contestants plan to build upon the methods developed during this contest to improve the accuracy of influenza forecasts.
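    The accuracy criteria above (peak week within 1 week, peak percentage within 1%) amount to simple tolerance checks against the observed ILINet values, as in the sketch below; all numbers and team labels are hypothetical, not challenge results.

```python
# Minimal sketch of the kind of scoring used in the challenge: checking
# whether a forecast's peak week and peak ILI percentage fall within the
# stated tolerances of the observed ILINet values. Numbers are hypothetical.
observed_peak_week, observed_peak_pct = 52, 4.6      # hypothetical ILINet truth

forecasts = [
    {"team": "A", "peak_week": 52, "peak_pct": 4.9},
    {"team": "B", "peak_week": 50, "peak_pct": 4.5},
    {"team": "C", "peak_week": 53, "peak_pct": 3.2},
]

for f in forecasts:
    week_ok = abs(f["peak_week"] - observed_peak_week) <= 1   # within 1 week
    pct_ok = abs(f["peak_pct"] - observed_peak_pct) <= 1.0    # within 1 percent
    print(f["team"], "peak week:", week_ok, "| peak %:", pct_ok)
```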

    Parametric Uncertainty in Intra-Herd Foot-and-Mouth Disease Epidemiological Models

    Epidemiological models that simulate the spread of Foot-and-Mouth Disease within a herd are the foundation of decision support tools used by governments to help advise and inform strategy to combat outbreaks. Contact transmission data used to parameterize these models, contrary to common assumption, contain a significant amount of variability and uncertainty. This finding suggests that the resulting model output might not accurately simulate the spread of an outbreak; if so, the impact of this uncertainty on the decision support tools used by governments might be significant.
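    One standard way to examine such parametric uncertainty is Monte Carlo propagation: sample transmission parameters from distributions reflecting the variability in the data and observe the spread in model outcomes. The sketch below illustrates the idea with a generic discrete-time intra-herd model; the model form and distributions are assumptions, not the paper's.

```python
# Minimal sketch of propagating parametric uncertainty through an intra-herd
# model: sample transmission parameters from distributions reflecting the
# variability in contact-transmission data, and observe the spread in outcomes.
# The model form and distributions here are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(3)
n_draws, herd, days = 1000, 200, 60

peaks = []
for _ in range(n_draws):
    beta = rng.lognormal(mean=np.log(0.5), sigma=0.4)   # uncertain transmission rate
    gamma = 1 / rng.uniform(3, 7)                       # uncertain infectious period
    s, i = herd - 1.0, 1.0
    history = []
    for _ in range(days):
        new_inf = beta * s * i / herd
        s, i = s - new_inf, i + new_inf - gamma * i
        history.append(i)
    peaks.append(max(history))

print(f"peak infected: median={np.median(peaks):.0f}, "
      f"90% interval=({np.percentile(peaks, 5):.0f}, {np.percentile(peaks, 95):.0f})")
```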

    Epi Archive: Automated Synthesis of Global Notifiable Disease Data

    Objective: Automatically collect and synthesize global notifiable disease data and make them available to humans and computers. Provide the data on the web and within the Biosurveillance Ecosystem (BSVE) as a novel data stream. These data have many applications, including improving the prediction and early warning of disease events.
    Introduction: Government reporting of notifiable disease data is common and widespread, though most countries do not report in a machine-readable format. This is despite the WHO International Health Regulations stating that "[e]ach State Party shall notify WHO, by the most efficient means of communication available" [1]. Data are often in the form of a file that contains text, tables, and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data-intensive approaches to epidemiology, biosurveillance, and public health. While most nations likely store incident data in a machine-readable format, governments can be hesitant to share data openly for a variety of reasons, including technical, political, economic, and motivational barriers [2]. A survey conducted by LANL of notifiable disease data reporting in over fifty countries identified only a few websites that report data in a machine-readable format. The majority (>70%) produce reports as PDF files on a regular basis. The bulk of the PDF reports present data in a structured tabular format, while some report in natural language or graphical charts. The structure and format of PDF reports change often; this adds to the complexity of identifying and parsing the desired data. Not all websites publish in English, and it is common to find typos and clerical errors. LANL has developed a tool, Epi Archive, to collect global notifiable disease data automatically and continuously and to make them uniform and readily accessible.
    Methods: A survey of national notifiable disease reporting systems is periodically conducted, noting how the data are reported and in what formats. We determined the minimal metadata required to contextualize incident counts properly, as well as optional metadata that is commonly found. The software that regularly ingests notifiable disease data and makes it available involves three to four main steps: scraping, detecting (when needed), parsing, and persisting.
    Scraping: We examine website design and determine the reporting mechanisms for each country/website, as well as what varies across those mechanisms. We then design and write code to automate the downloading of data for each country. We store all artifacts presented as files (PDF, XLSX, etc.) in their original form, along with appropriate metadata for parsing and data provenance.
    Detecting: This step is required when parsing structured, non-machine-readable data, such as tabular data in PDF files. We combine the Nurminen methodology of PDF table detection with in-house heuristics to find the desired data within PDF reports [3].
    Parsing: We determine what to extract from each dataset and parse these data into uniform data structures, correctly accommodating variations in metadata (e.g., time interval definitions) and the various human languages.
    Persisting: We store the data in the Epi Archive database and make them available on the internet and through the BSVE. The data are persisted into a structured and normalized SQL database.
    Results: Epi Archive currently contains national and/or subnational notifiable disease data from thirty-nine nations. When users access the Epi Archive site, they are able to peruse, chart, and download data by country, subregion, disease, and time interval. A cached version of the original artifacts (e.g., PDF files), a link to the source, and additional metadata are also available through the user interface. Finally, to ensure machine-readability, the data in Epi Archive can be reached through a REST API: http://epiarchive.bsvgateway.org/
    Conclusions: LANL, as part of a currently funded DTRA effort, is automatically and continually collecting global notifiable disease data. While thirty-nine nations are in production, more are being brought online in the near future. These data are already being utilized and have many applications, including improving the prediction and early warning of disease events.
    References:
    [1] WHO International Health Regulations, 3rd edition. http://apps.who.int/iris/bitstream/10665/246107/1/9789241580496-eng.pdf
    [2] van Panhuis WG, Paul P, Emerson C, et al. A systematic review of barriers to data sharing in public health. BMC Public Health. 2014;14:1144. doi:10.1186/1471-2458-14-1144
    [3] Nurminen A. Algorithmic extraction of data in tables in PDF documents. 2013.
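    The scrape/detect/parse/persist pipeline described in the Methods above can be illustrated with a minimal end-to-end sketch; the report format, table layout, and schema below are hypothetical placeholders, since the real Epi Archive handles many formats and languages.

```python
# Minimal sketch of the scrape -> detect -> parse -> persist pipeline described
# above. URLs, table layout, and schema are hypothetical placeholders; the real
# Epi Archive handles many report formats and languages.
import sqlite3
import urllib.request

def scrape(url: str) -> bytes:
    """Download a report artifact, keeping the original bytes for provenance."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def detect_and_parse(text: str) -> list[tuple[str, str, int]]:
    """Pull (disease, week, count) rows out of a simple delimited report.
    Real reports are PDFs and need table detection, e.g. Nurminen's method."""
    rows = []
    for line in text.strip().splitlines():
        disease, week, count = line.split(",")
        rows.append((disease.strip(), week.strip(), int(count)))
    return rows

def persist(rows, db_path=":memory:"):
    """Store normalized rows in a structured SQL database."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS counts (disease TEXT, week TEXT, n INTEGER)")
    con.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
    con.commit()
    return con

# A toy report stands in for a downloaded artifact.
report = "dengue, 2016-W01, 1234\nmeasles, 2016-W01, 7"
con = persist(detect_and_parse(report))
print(con.execute("SELECT * FROM counts").fetchall())
```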